Exploiting Data-Independence for Fast Belief-Propagation

Authors

  • Julian J. McAuley
  • Tibério S. Caetano
Abstract

MAP-inference in graphical models requires that we maximize the sum of two terms: a data-dependent term, encoding the conditional likelihood of a certain labeling given an observation, and a data-independent term, encoding some prior on labelings. Often, the data-dependent factors contain fewer latent variables than the data-independent factors. We note that MAP-inference in any such graphical model can be made substantially faster by appropriately preprocessing its data-independent terms. Our main result is to show that message-passing in any such pairwise model has an expected-case exponent of only 1.5 on the number of states per node, leading to significantly faster algorithms than the standard quadratic-time solution.

'Data-Independence'

MAP-inference in a graphical model G consists of solving an optimization problem of the form

    \hat{y} = \operatorname{argmax}_y \sum_{C \in \mathcal{C}} \Phi_C(y_C),

where \mathcal{C} is the set of cliques in the model. Often, the model can be further factorized if we make a distinction between the latent variables y and the observation x:

    \hat{y}(x) = \operatorname{argmax}_y \underbrace{\sum_{F \in \mathcal{F}} \Phi_F(y_F \mid x_F)}_{\text{data-dependent}} + \underbrace{\sum_{C \in \mathcal{C}} \Phi_C(y_C)}_{\text{data-independent}}.

We say that the cliques containing only latent variables are data-independent. In many models, the cliques that contain an observed variable contain fewer latent variables than the purely latent cliques, i.e., each F \in \mathcal{F} is a proper subset of some C \in \mathcal{C}. Examples of such models are shown below.

Example Models

[Figure: examples of graphical models to which our results apply; cliques containing observations have fewer latent variables than purely latent cliques.]

In other words, cliques containing a grey node encode the data likelihood, whereas cliques containing only white nodes encode priors. We focus on cases where the grey nodes have degree one (i.e., they are connected to only one white node). In such cases we obtain an Ω(√N) speedup on the number of states per node.

Message-Passing

In these models, message-passing between two cliques A = (i, j) and B = (j, k) takes the form

    m_{A \to B}(y_j) = \Psi_j(y_j) + \max_{y_i} \left\{ \Psi_i(y_i) + \Phi_{i,j}(y_i, y_j) \right\},    (1)

which is equivalent to matrix-vector multiplication in the max-sum semiring. In a recent paper [1], we showed that matrix-matrix multiplication of N × N matrices in this semiring takes expected subcubic time, O(N^{2.5}). In our current work, we note that a similar result can be applied to matrix-vector multiplication, so long as the matrix is known in advance. Since the 'matrix' in the above equation simply encodes a prior, it can be preprocessed offline.

How it Works

[Figure: Steps 1-5 of the search on a 16-state example. The two rows show the values of v_a and v_b sorted in decreasing order, together with their index permutations p_a and p_b; the annotation "don't search past this line" marks the point beyond which no candidate need be examined.]

We wish to compute max_i v_a[i] + v_b[i]. Arrows connect corresponding elements of v_a and v_b, as sorted by p_a and p_b. We draw a red line connecting the leftmost arrowheads that have been seen so far. Any arrow whose tail lies to the right of this line cannot possibly correspond to an optimal solution.
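Below is a minimal Python sketch of this search, reconstructed from the figure's description rather than taken from the authors' code: both vectors are traversed in decreasing order, every index encountered is evaluated as a candidate, and the traversal stops as soon as some index has appeared in both sorted prefixes, which is precisely the red-line rule above. The function name max_sum_pair and the representation of p_a and p_b as index lists are our own illustrative choices.

    def max_sum_pair(v_a, v_b, order_a, order_b):
        """Compute max_i v_a[i] + v_b[i].

        order_a and order_b list the indices of v_a and v_b sorted by
        decreasing value. For random orderings, a birthday-paradox
        argument gives an expected O(sqrt(N)) iterations before stopping.
        """
        best, best_i = float("-inf"), -1
        seen_a, seen_b = set(), set()
        for k in range(len(order_a)):
            i, j = order_a[k], order_b[k]
            seen_a.add(i)
            seen_b.add(j)
            for idx in (i, j):  # evaluate both newly exposed candidates
                s = v_a[idx] + v_b[idx]
                if s > best:
                    best, best_i = s, idx
            # Stopping rule ("don't search past this line"): once some index
            # lies in both sorted prefixes, every index outside both prefixes
            # is dominated in v_a and v_b simultaneously, so it cannot beat
            # a candidate we have already evaluated.
            if i in seen_b or j in seen_a:
                break
        return best, best_i

In the message-passing setting, one of the two orderings comes from a column of the data-independent potential and is computed offline; the other comes from the data-dependent term and is sorted online, but only once per message rather than once per entry, which is how the expected per-message cost of O(N^{1.5}) arises.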
Experiments

[Figure: number of online additions per message entry as a function of N (number of states), comparing the naïve method, our method, the bound $2\sqrt{N}$, and the exact expectation $2\sum_{m=0}^{\lfloor N/2 \rfloor} \frac{(N-m)!\,(N-m)!}{(N-2m)!\,N!}$.]

[Figure: total wall time in seconds as a function of N (number of states) for random potentials on a 2500-node chain, comparing the naïve method against our method, with a fitted curve for each.]
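To see how the pieces fit together, here is a hedged end-to-end sketch, again our own illustration rather than the authors' code, of the offline preprocessing and the online computation of the message in equation (1); it reuses max_sum_pair from the sketch above and omits the unary term Ψ_j(y_j), which can be added to each entry in constant time afterwards. The names preprocess_prior and fast_message are ours.

    import numpy as np

    def preprocess_prior(phi):
        """Offline step: for each column y_j of the N x N data-independent
        potential, store the row indices sorted by decreasing value."""
        return np.argsort(-phi, axis=0)

    def fast_message(psi, phi, column_orders):
        """Online step: m(y_j) = max_{y_i} psi[y_i] + phi[y_i, y_j], for all y_j.

        A single O(N log N) sort of psi is shared across all N entries;
        each entry then costs an expected O(sqrt(N)) search, giving an
        expected O(N^1.5) total versus O(N^2) for the naive method.
        """
        order_psi = np.argsort(-psi)
        message = np.empty(len(psi))
        for y_j in range(len(psi)):
            message[y_j], _ = max_sum_pair(psi, phi[:, y_j],
                                           order_psi, column_orders[:, y_j])
        return message

    # Sanity check against the naive O(N^2) message on random potentials.
    rng = np.random.default_rng(0)
    N = 200
    phi = rng.random((N, N))
    psi = rng.random(N)
    naive = (psi[:, None] + phi).max(axis=0)
    assert np.allclose(fast_message(psi, phi, preprocess_prior(phi)), naive)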

Related Articles

In-Network Nonparametric Loopy Belief Propagation on Sensor Networks for Ad-Hoc Localization

Sensor Networks provide a cheap, unobtrusive, and easy-to-deploy method for gathering large quantities of data from an environment. While this data is often noisy, we can compensate by exploiting spatial correlation. This paper proposes the use of the statistical inference method of Loopy Belief Propagation (LBP) to exploit this correlation structure in the context of a well-examined problem in...

Independence of Causal Influence and Clique Tree Propagation

This paper explores the role of independence of causal influence (ICI) in Bayesian network inference. ICI allows one to factorize a conditional probability table into smaller pieces. We describe a method for exploiting the factorization in clique tree propagation (CTP), the state-of-the-art exact inference algorithm for Bayesian networks. We also present empirical results showing that the resul...

MapReduce Lifting for Belief Propagation

Judging by the increasing impact of machine learning on large-scale data analysis in the last decade, one can anticipate a substantial growth in diversity of the machine learning applications for “big data” over the next decade. This exciting new opportunity, however, also raises many challenges. One of them is scaling inference within and training of graphical models. Typical ways to address t...

High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains

In this paper, we present a novel framework incorporating a combination of sparse models in different domains. We posit the observed data as generated from a linear combination of a sparse Gaussian Markov model (with a sparse precision matrix) and a sparse Gaussian independence model (with a sparse covariance matrix). We provide efficient methods for decomposition of the data into two domains, ...

Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

Discrete Fourier transforms and other related Fourier methods have been practically implementable due to the fast Fourier transform (FFT). However, there are many situations where doing fast Fourier transforms without complete data would be desirable. In this paper it is recognised that formulating the FFT algorithm as a belief network allows suitable priors to be set for the Fourier coefficient...

Online Belief Propagation for Topic Modeling

Not only can online topic modeling algorithms extract topics from big data streams with constant memory requirements, but they can also detect topic shifts as the data stream flows. Fast convergence speed is a desired property for batch learning topic models such as latent Dirichlet allocation (LDA), which can further facilitate developing fast online topic modeling algorithms for big data streams. ...


Journal title:

Volume   Issue

Pages  -

Publication date: 2010